Overview

Dataset statistics

Number of variables: 19
Number of observations: 53730
Missing cells: 305030
Missing cells (%): 29.9%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 7.4 MiB
Average record size in memory: 145.0 B
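
The percentages above follow directly from the raw counts. As a quick sanity check, the sketch below recomputes the missing-cell share and the average record size from the figures reported in this section. The variable names are illustrative, and the record size is only approximate because 7.4 MiB is itself a rounded value.

```python
# Figures copied from the dataset statistics above.
n_variables = 19
n_observations = 53730
missing_cells = 305030

# Every observation has one cell per variable.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# 7.4 MiB is rounded in the report, so this is an approximation.
total_bytes = 7.4 * 1024 ** 2
avg_record_bytes = total_bytes / n_observations

print(f"{total_cells} cells, {missing_pct:.1f}% missing")
print(f"roughly {avg_record_bytes:.0f} B per record")
```

The recomputed record size lands within a byte or so of the reported 145.0 B, which is the precision one can expect from a rounded total.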

Variable types

NUM: 9
CAT: 5
BOOL: 5

Warnings

type_of_sale has constant value "FOR_SALE" (53730 rows)  Constant
subtype_of_property is highly correlated with type_of_property  High correlation
type_of_property is highly correlated with subtype_of_property  High correlation
price has 3330 (6.2%) missing values  Missing
nr_of_rooms has 3161 (5.9%) missing values  Missing
area has 11267 (21.0%) missing values  Missing
equiped_kitchen has 20574 (38.3%) missing values  Missing
furnished has 26752 (49.8%) missing values  Missing
terrace has 24972 (46.5%) missing values  Missing
terrace_area has 36354 (67.7%) missing values  Missing
garden has 39390 (73.3%) missing values  Missing
garden_area has 45601 (84.9%) missing values  Missing
total_land_area has 25353 (47.2%) missing values  Missing
nr_of_facades has 18256 (34.0%) missing values  Missing
swimming_pool has 32150 (59.8%) missing values  Missing
building_condition has 17870 (33.3%) missing values  Missing
price is highly skewed (γ1 = 22.2881236)  Skewed
nr_of_rooms is highly skewed (γ1 = 36.68169628)  Skewed
terrace_area is highly skewed (γ1 = 76.34372105)  Skewed
garden_area is highly skewed (γ1 = 71.7730675)  Skewed
total_land_area is highly skewed (γ1 = 55.12911822)  Skewed
id has unique values  Unique
nr_of_rooms has 1206 (2.2%) zeros  Zeros
total_land_area has 3514 (6.5%) zeros  Zeros
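
The Skewed warnings report γ1, the sample skewness. The sketch below shows one common way to compute it, the bias-adjusted Fisher-Pearson estimator. Whether this is the exact estimator behind the report, and whether a warning fires above an absolute value of 20, are both assumptions and are not stated in this document; `sample_skewness`, `is_highly_skewed` and `SKEW_THRESHOLD` are hypothetical names.

```python
import math

# Assumed warning threshold; the report itself does not state it.
SKEW_THRESHOLD = 20.0


def sample_skewness(values):
    """Bias-adjusted Fisher-Pearson skewness; None entries count as missing."""
    xs = [x for x in values if x is not None]
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three non-missing values")
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    if m2 == 0:
        return 0.0  # constant data has no asymmetry
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


def is_highly_skewed(values, threshold=SKEW_THRESHOLD):
    """Flag a column whose absolute skewness exceeds the assumed threshold."""
    return abs(sample_skewness(values)) > threshold


symmetric = [1, 2, 3, 4, 5]
long_tail = [1] * 999 + [1_000_000]  # a single extreme value dominates
print(sample_skewness(symmetric), is_highly_skewed(long_tail))
```

A handful of extreme values is enough to push γ1 very high, which is consistent with the maxima listed for price, terrace_area, garden_area and total_land_area further down.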

Reproduction

Analysis started: 2020-12-07 05:38:36.651730
Analysis finished: 2020-12-07 05:39:08.699758
Duration: 32.05 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml

Variables

id
Real number (ℝ≥0)

UNIQUE

Distinct: 53730
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8832927.854
Minimum: 1882546
Maximum: 9066628
Zeros: 0
Zeros (%): 0.0%
Memory size: 419.8 KiB

Quantile statistics

Minimum: 1882546
5-th percentile: 8182803.45
Q1: 8801772
Median: 8948633.5
Q3: 9017599.75
95-th percentile: 9057278.55
Maximum: 9066628
Range: 7184082
Interquartile range (IQR): 215827.75

Descriptive statistics

Standard deviation: 351575.5252
Coefficient of variation (CV): 0.03980282993
Kurtosis: 36.28978961
Mean: 8832927.854
Median Absolute Deviation (MAD): 84684
Skewness: -4.526475785
Sum: 4.745932136e+11
Variance: 1.236053499e+11
Monotonicity: Not monotonic
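
Several of the figures reported for each numeric variable are simple functions of the others: the range is maximum minus minimum, the IQR is Q3 minus Q1, and the CV is the standard deviation divided by the mean. The sketch below illustrates these relationships on a toy list and then re-derives two of the id figures. `spread_summary` is a hypothetical helper, and the quantile interpolation method is an assumption, since this document does not say which one the report uses.

```python
import statistics


def spread_summary(values):
    """Return range, IQR, CV and MAD for a list of numbers."""
    xs = sorted(values)
    # method="inclusive" interpolates linearly between order statistics.
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    mean = statistics.fmean(xs)
    return {
        "range": xs[-1] - xs[0],
        "iqr": q3 - q1,
        "cv": statistics.stdev(xs) / mean,
        "mad": statistics.median([abs(x - median) for x in xs]),
    }


summary = spread_summary([1, 2, 3, 4, 5])
print(summary)

# The same identities applied to the figures reported for id.
id_iqr = 9017599.75 - 8801772        # Q3 - Q1
id_cv = 351575.5252 / 8832927.854    # standard deviation / mean
print(id_iqr, id_cv)
```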
Histogram with fixed size bins (bins=50)
Value  Count  Frequency (%)
8785918  1  < 0.1%
8899948  1  < 0.1%
9012517  1  < 0.1%
9014564  1  < 0.1%
9024803  1  < 0.1%
9026850  1  < 0.1%
8889633  1  < 0.1%
8891680  1  < 0.1%
8807705  1  < 0.1%
8809752  1  < 0.1%
Other values (53720)  53720  > 99.9%

Value  Count  Frequency (%)
1882546  1  < 0.1%
2335739  1  < 0.1%
2784938  1  < 0.1%
3001135  1  < 0.1%
3702839  1  < 0.1%

Value  Count  Frequency (%)
9066628  1  < 0.1%
9066556  1  < 0.1%
9066527  1  < 0.1%
9066417  1  < 0.1%
9066399  1  < 0.1%

locality
Real number (ℝ≥0)

Distinct: 1051
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5372.749079
Minimum: 1000
Maximum: 9992
Zeros: 0
Zeros (%): 0.0%
Memory size: 419.8 KiB

Quantile statistics

Minimum: 1000
5-th percentile: 1070
Q1: 2260
Median: 5580
Q3: 8450
95-th percentile: 9550
Maximum: 9992
Range: 8992
Interquartile range (IQR): 6190

Descriptive statistics

Standard deviation: 3111.313225
Coefficient of variation (CV): 0.5790914817
Kurtosis: -1.617409373
Mean: 5372.749079
Median Absolute Deviation (MAD): 3040
Skewness: -0.03945887581
Sum: 288677808
Variance: 9680269.981
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Value  Count  Frequency (%)
8300  1268  2.4%
9000  910  1.7%
1180  842  1.6%
8400  814  1.5%
1000  809  1.5%
2000  617  1.1%
1050  582  1.1%
8370  551  1.0%
1070  520  1.0%
8670  472  0.9%
Other values (1041)  46345  86.3%

Value  Count  Frequency (%)
1000  809  1.5%
1020  172  0.3%
1030  418  0.8%
1040  217  0.4%
1050  582  1.1%

Value  Count  Frequency (%)
9992  7  < 0.1%
9991  103  0.2%
9990  108  0.2%
9988  12  < 0.1%
9982  11  < 0.1%

type_of_property
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 419.8 KiB

Value  Count  Frequency (%)
HOUSE  28378  52.8%
APARTMENT  22194  41.3%
APARTMENT_GROUP  2341  4.4%
HOUSE_GROUP  817  1.5%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 15
Median length: 5
Mean length: 7.179192258
Min length: 5

subtype_of_property
Categorical

HIGH CORRELATION

Distinct: 25
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 419.8 KiB

Value  Count  Frequency (%)
HOUSE  20685  38.5%
APARTMENT  16956  31.6%
VILLA  2954  5.5%
APARTMENT_GROUP  2341  4.4%
DUPLEX  1391  2.6%
APARTMENT_BLOCK  1244  2.3%
GROUND_FLOOR  1184  2.2%
PENTHOUSE  1027  1.9%
MIXED_USE_BUILDING  1008  1.9%
HOUSE_GROUP  817  1.5%
Other values (15)  4123  7.7%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 20
Median length: 9
Mean length: 7.981313977
Min length: 3

price
Real number (ℝ≥0)

MISSING
SKEWED

Distinct: 4640
Distinct (%): 9.2%
Missing: 3330
Missing (%): 6.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 396311.4335
Minimum: 1
Maximum: 35000000
Zeros: 0
Zeros (%): 0.0%
Memory size: 419.8 KiB

Quantile statistics

Minimum: 1
5-th percentile: 115000
Q1: 200000
Median: 285000
Q3: 405000
95-th percentile: 995000
Maximum: 35000000
Range: 34999999
Interquartile range (IQR): 205000

Descriptive statistics

Standard deviation: 523834.5236
Coefficient of variation (CV): 1.321774946
Kurtosis: 1224.348396
Mean: 396311.4335
Median Absolute Deviation (MAD): 95000
Skewness: 22.2881236
Sum: 1.997409625e+10
Variance: 2.744026081e+11
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value  Count  Frequency (%)
249000  670  1.2%
299000  622  1.2%
199000  622  1.2%
275000  595  1.1%
295000  578  1.1%
225000  544  1.0%
395000  505  0.9%
195000  503  0.9%
175000  494  0.9%
235000  473  0.9%
Other values (4630)  44794  83.4%
(Missing)  3330  6.2%
Value  Count  Frequency (%)
1  1  < 0.1%
65  1  < 0.1%
2500  3  < 0.1%
3500  1  < 0.1%
4000  1  < 0.1%
Value  Count  Frequency (%)
35000000  3  < 0.1%
21600000  1  < 0.1%
13500000  2  < 0.1%
9500000  1  < 0.1%
8750000  1  < 0.1%

type_of_sale
Categorical

CONSTANT
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 419.8 KiB

Value  Count  Frequency (%)
FOR_SALE  53730  100.0%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

nr_of_rooms
Real number (ℝ≥0)

MISSING
SKEWED
ZEROS

Distinct: 45
Distinct (%): 0.1%
Missing: 3161
Missing (%): 5.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.92669422
Minimum: 0
Maximum: 204
Zeros: 1206
Zeros (%): 2.2%
Memory size: 419.8 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 2
Median: 3
Q3: 3
95-th percentile: 5
Maximum: 204
Range: 204
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 2.606963675
Coefficient of variation (CV): 0.8907536896
Kurtosis: 2493.272403
Mean: 2.92669422
Median Absolute Deviation (MAD): 1
Skewness: 36.68169628
Sum: 148000
Variance: 6.7962596
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=45)

Value  Count  Frequency (%)
3  16650  31.0%
2  15462  28.8%
4  7154  13.3%
1  5002  9.3%
5  2754  5.1%
0  1206  2.2%
6  1162  2.2%
7  435  0.8%
8  252  0.5%
9  124  0.2%
Other values (35)  368  0.7%
(Missing)  3161  5.9%

Value  Count  Frequency (%)
0  1206  2.2%
1  5002  9.3%
2  15462  28.8%
3  16650  31.0%
4  7154  13.3%

Value  Count  Frequency (%)
204  3  < 0.1%
165  1  < 0.1%
80  4  < 0.1%
71  1  < 0.1%
70  1  < 0.1%

area
Real number (ℝ≥0)

MISSING

Distinct: 822
Distinct (%): 1.9%
Missing: 11267
Missing (%): 21.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 170.7482279
Minimum: 1
Maximum: 11366
Zeros: 0
Zeros (%): 0.0%
Memory size: 419.8 KiB

Quantile statistics

Minimum: 1
5-th percentile: 57
Q1: 92
Median: 132
Q3: 200
95-th percentile: 404
Maximum: 11366
Range: 11365
Interquartile range (IQR): 108

Descriptive statistics

Standard deviation: 170.3762744
Coefficient of variation (CV): 0.9978216261
Kurtosis: 815.4907766
Mean: 170.7482279
Median Absolute Deviation (MAD): 47
Skewness: 17.71983869
Sum: 7250482
Variance: 29028.07487
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Value  Count  Frequency (%)
100  873  1.6%
90  843  1.6%
120  816  1.5%
150  757  1.4%
110  706  1.3%
80  696  1.3%
140  668  1.2%
200  651  1.2%
85  631  1.2%
160  614  1.1%
Other values (812)  35208  65.5%
(Missing)  11267  21.0%

Value  Count  Frequency (%)
1  2  < 0.1%
5  2  < 0.1%
13  1  < 0.1%
14  1  < 0.1%
15  3  < 0.1%

Value  Count  Frequency (%)
11366  1  < 0.1%
8750  1  < 0.1%
8521  1  < 0.1%
6293  1  < 0.1%
4380  1  < 0.1%

equiped_kitchen
Categorical

MISSING

Distinct: 8
Distinct (%): < 0.1%
Missing: 20574
Missing (%): 38.3%
Memory size: 419.8 KiB

Value  Count  Frequency (%)
INSTALLED  17750  33.0%
HYPER_EQUIPPED  6703  12.5%
SEMI_EQUIPPED  3537  6.6%
USA_HYPER_EQUIPPED  2138  4.0%
NOT_INSTALLED  1917  3.6%
USA_INSTALLED  887  1.7%
USA_SEMI_EQUIPPED  197  0.4%
USA_UNINSTALLED  27  0.1%
(Missing)  20574  38.3%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 18
Median length: 9
Mean length: 8.188814443
Min length: 3

furnished
Boolean

MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 26752
Missing (%): 49.8%
Memory size: 419.8 KiB

Value  Count  Frequency (%)
False  25398  47.3%
True  1580  2.9%
(Missing)  26752  49.8%

open_fire
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 52.5 KiB

Value  Count  Frequency (%)
False  51233  95.4%
True  2497  4.6%

terrace
Boolean

MISSING

Distinct: 1
Distinct (%): < 0.1%
Missing: 24972
Missing (%): 46.5%
Memory size: 419.8 KiB

Value  Count  Frequency (%)
True  28758  53.5%
(Missing)  24972  46.5%

terrace_area
Real number (ℝ≥0)

MISSING
SKEWED

Distinct: 213
Distinct (%): 1.2%
Missing: 36354
Missing (%): 67.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 29.13092772
Minimum: 1
Maximum: 20000
Zeros: 0
Zeros (%): 0.0%
Memory size: 419.8 KiB

Quantile statistics

Minimum: 1
5-th percentile: 4
Q1: 9
Median: 16
Q3: 30
95-th percentile: 78
Maximum: 20000
Range: 19999
Interquartile range (IQR): 21

Descriptive statistics

Standard deviation: 191.8688313
Coefficient of variation (CV): 6.586430516
Kurtosis: 7174.296533
Mean: 29.13092772
Median Absolute Deviation (MAD): 9
Skewness: 76.34372105
Sum: 506179
Variance: 36813.64841
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Value  Count  Frequency (%)
10  1033  1.9%
20  1005  1.9%
15  809  1.5%
6  784  1.5%
12  749  1.4%
8  726  1.4%
30  667  1.2%
9  594  1.1%
25  570  1.1%
5  563  1.0%
Other values (203)  9876  18.4%
(Missing)  36354  67.7%

Value  Count  Frequency (%)
1  70  0.1%
2  313  0.6%
3  376  0.7%
4  551  1.0%
5  563  1.0%

Value  Count  Frequency (%)
20000  1  < 0.1%
8000  2  < 0.1%
6000  1  < 0.1%
3500  1  < 0.1%
3400  1  < 0.1%

garden
Boolean

MISSING

Distinct: 1
Distinct (%): < 0.1%
Missing: 39390
Missing (%): 73.3%
Memory size: 419.8 KiB

Value  Count  Frequency (%)
True  14340  26.7%
(Missing)  39390  73.3%

garden_area
Real number (ℝ≥0)

MISSING
SKEWED

Distinct: 1192
Distinct (%): 14.7%
Missing: 45601
Missing (%): 84.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 1093.024972
Minimum: 1
Maximum: 1134500
Zeros: 0
Zeros (%): 0.0%
Memory size: 419.8 KiB

Quantile statistics

Minimum: 1
5-th percentile: 20
Q1: 70
Median: 185
Q3: 591
95-th percentile: 2869.2
Maximum: 1134500
Range: 1134499
Interquartile range (IQR): 521

Descriptive statistics

Standard deviation: 13683.43463
Coefficient of variation (CV): 12.51886734
Kurtosis: 5827.776013
Mean: 1093.024972
Median Absolute Deviation (MAD): 145
Skewness: 71.7730675
Sum: 8885200
Variance: 187236383.2
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Value  Count  Frequency (%)
100  255  0.5%
50  201  0.4%
200  188  0.3%
300  148  0.3%
400  133  0.2%
60  128  0.2%
80  128  0.2%
150  128  0.2%
250  123  0.2%
40  123  0.2%
Other values (1182)  6574  12.2%
(Missing)  45601  84.9%

Value  Count  Frequency (%)
1  68  0.1%
2  1  < 0.1%
3  3  < 0.1%
4  15  < 0.1%
5  12  < 0.1%

Value  Count  Frequency (%)
1134500  1  < 0.1%
312600  1  < 0.1%
110000  1  < 0.1%
94000  1  < 0.1%
80978  1  < 0.1%

total_land_area
Real number (ℝ≥0)

MISSING
SKEWED
ZEROS

Distinct: 3393
Distinct (%): 12.0%
Missing: 25353
Missing (%): 47.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 1261.731015
Minimum: 0
Maximum: 850000
Zeros: 3514
Zeros (%): 6.5%
Memory size: 419.8 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 140
Median: 350
Q3: 838
95-th percentile: 3490.4
Maximum: 850000
Range: 850000
Interquartile range (IQR): 698

Descriptive statistics

Standard deviation: 7884.917281
Coefficient of variation (CV): 6.249285458
Kurtosis: 5086.635899
Mean: 1261.731015
Median Absolute Deviation (MAD): 268
Skewness: 55.12911822
Sum: 35804141
Variance: 62171920.53
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Value  Count  Frequency (%)
0  3514  6.5%
150  216  0.4%
100  214  0.4%
200  178  0.3%
1000  173  0.3%
120  172  0.3%
300  169  0.3%
250  165  0.3%
180  149  0.3%
500  139  0.3%
Other values (3383)  23288  43.3%
(Missing)  25353  47.2%

Value  Count  Frequency (%)
0  3514  6.5%
1  24  < 0.1%
2  2  < 0.1%
3  1  < 0.1%
4  1  < 0.1%

Value  Count  Frequency (%)
850000  1  < 0.1%
396300  1  < 0.1%
250000  1  < 0.1%
226952  1  < 0.1%
220000  1  < 0.1%

nr_of_facades
Real number (ℝ≥0)

MISSING

Distinct: 7
Distinct (%): < 0.1%
Missing: 18256
Missing (%): 34.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.76450358
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Memory size: 419.8 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 2
Median: 3
Q3: 4
95-th percentile: 4
Maximum: 10
Range: 9
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 0.8632200106
Coefficient of variation (CV): 0.3122513629
Kurtosis: -1.225925341
Mean: 2.76450358
Median Absolute Deviation (MAD): 1
Skewness: 0.3848056267
Sum: 98068
Variance: 0.7451487866
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=7)

Value  Count  Frequency (%)
2  17167  32.0%
4  9591  17.9%
3  8318  15.5%
1  395  0.7%
6  1  < 0.1%
5  1  < 0.1%
10  1  < 0.1%
(Missing)  18256  34.0%

Value  Count  Frequency (%)
1  395  0.7%
2  17167  32.0%
3  8318  15.5%
4  9591  17.9%
5  1  < 0.1%

Value  Count  Frequency (%)
10  1  < 0.1%
6  1  < 0.1%
5  1  < 0.1%
4  9591  17.9%
3  8318  15.5%

swimming_pool
Boolean

MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 32150
Missing (%): 59.8%
Memory size: 419.8 KiB

Value  Count  Frequency (%)
False  20448  38.1%
True  1132  2.1%
(Missing)  32150  59.8%

building_condition
Categorical

MISSING

Distinct: 6
Distinct (%): < 0.1%
Missing: 17870
Missing (%): 33.3%
Memory size: 419.8 KiB

Value  Count  Frequency (%)
AS_NEW  14001  26.1%
GOOD  12812  23.8%
TO_BE_DONE_UP  3199  6.0%
TO_RENOVATE  3128  5.8%
JUST_RENOVATED  2524  4.7%
TO_RESTORE  196  0.4%
(Missing)  17870  33.3%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 14
Median length: 4
Mean length: 5.623580867
Min length: 3

Interactions


Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
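
The definition above translates almost line for line into code. The following is a minimal pure-Python sketch, not the report's own implementation; `pearson_r` is a hypothetical helper and the area and price values are made up so that price is an exact linear function of area.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # The 1/n factors of the covariance and the standard deviations cancel out.
    cov = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if norm == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / norm


area = [50.0, 80.0, 120.0, 200.0]                      # made-up values
price = [150_000.0, 210_000.0, 290_000.0, 450_000.0]   # 2000 * area + 50_000
r = pearson_r(area, price)
# Shifting and rescaling one variable leaves r unchanged.
r_rescaled = pearson_r(area, [p / 1000 + 7 for p in price])
print(r, r_rescaled)
```

The guard for a constant input matters for this dataset: a column such as type_of_sale, which holds a single value, has zero standard deviation and therefore no defined correlation.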

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
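
The pair-counting rule above corresponds to the variant usually called tau-a. The sketch below implements exactly that rule in quadratic time; `kendall_tau_a` is a hypothetical helper. Note that libraries such as SciPy default to the tie-adjusted tau-b, so figures in the report may differ from tau-a wherever the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs; ties count as neither."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least two points")
    pairs = list(combinations(zip(xs, ys), 2))
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in pairs:
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
    return (concordant - discordant) / len(pairs)


tau_same = kendall_tau_a([1, 2, 3, 4], [10, 20, 30, 40])
tau_mixed = kendall_tau_a([1, 2, 3], [1, 3, 2])
print(tau_same, tau_mixed)
```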

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available for it.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
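
As a rough illustration of the bias correction, the sketch below computes the chi-squared statistic from a contingency table, subtracts the expected bias from φ², and shrinks the row and column counts accordingly. It is a pure-Python sketch under the assumption that no continuity correction is applied to chi-squared; the report's own implementation may differ, and `cramers_v_corrected` is a hypothetical name.

```python
import math
from collections import Counter


def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    r, k = len(row_totals), len(col_totals)
    if n < 2 or r < 2 or k < 2:
        return 0.0  # a constant variable carries no association
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            chi2 += (joint[(a, b)] - expected) ** 2 / expected
    phi2 = chi2 / n
    # Subtract the expected bias of phi2 and shrink the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0  # too few observations per category to estimate V
    return math.sqrt(phi2_corr / denom)


kind = ["HOUSE"] * 10 + ["APARTMENT"] * 10
subtype = ["VILLA"] * 10 + ["DUPLEX"] * 10     # fully determined by kind
unrelated = ["A", "B"] * 10                    # evenly split within each kind
print(cramers_v_corrected(kind, subtype), cramers_v_corrected(kind, unrelated))
```

A pair such as type_of_property and subtype_of_property, where one label largely determines the other, is the kind of relationship that yields a value close to 1 and triggers the High correlation warning listed in the overview.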

Missing values


Sample

First rows

id  locality  type_of_property  subtype_of_property  price  type_of_sale  nr_of_rooms  area  equiped_kitchen  furnished  open_fire  terrace  terrace_area  garden  garden_area  total_land_area  nr_of_facades  swimming_pool  building_condition
0  9044081  1083  APARTMENT  APARTMENT  265000.0  FOR_SALE  4.0  90.0  INSTALLED  False  False  True  13.0  NaN  NaN  NaN  4.0  NaN  AS_NEW
1  9043978  1000  APARTMENT  APARTMENT  1795000.0  FOR_SALE  4.0  650.0  USA_HYPER_EQUIPPED  False  True  True  400.0  NaN  NaN  NaN  3.0  NaN  AS_NEW
2  9044188  1050  HOUSE  MANSION  3800000.0  FOR_SALE  5.0  752.0  HYPER_EQUIPPED  False  False  True  40.0  True  NaN  340.0  2.0  NaN  JUST_RENOVATED
3  9041095  4860  HOUSE  HOUSE  320000.0  FOR_SALE  5.0  231.0  NOT_INSTALLED  False  False  True  30.0  True  1200.0  1421.0  3.0  False  AS_NEW
4  9042175  1160  APARTMENT_GROUP  APARTMENT_GROUP  NaN  FOR_SALE  NaN  NaN  NaN  NaN  False  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
5  9041098  6001  APARTMENT_GROUP  APARTMENT_GROUP  NaN  FOR_SALE  NaN  NaN  NaN  NaN  False  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
6  9043036  9600  APARTMENT  APARTMENT  195000.0  FOR_SALE  2.0  75.0  INSTALLED  NaN  False  NaN  NaN  NaN  NaN  NaN  2.0  NaN  GOOD
7  9042950  6010  APARTMENT  TRIPLEX  235000.0  FOR_SALE  3.0  149.0  HYPER_EQUIPPED  False  False  True  15.0  NaN  NaN  NaN  2.0  False  AS_NEW
8  9042073  1070  APARTMENT  APARTMENT  320000.0  FOR_SALE  3.0  130.0  USA_HYPER_EQUIPPED  False  False  True  14.0  NaN  NaN  NaN  2.0  NaN  AS_NEW
9  9042267  7181  HOUSE  VILLA  325000.0  FOR_SALE  2.0  130.0  INSTALLED  False  False  True  30.0  True  600.0  1043.0  4.0  False  TO_BE_DONE_UP

Last rows

id  locality  type_of_property  subtype_of_property  price  type_of_sale  nr_of_rooms  area  equiped_kitchen  furnished  open_fire  terrace  terrace_area  garden  garden_area  total_land_area  nr_of_facades  swimming_pool  building_condition
53720  9020073  3020  HOUSE  HOUSE  411217.0  FOR_SALE  3.0  NaN  NaN  NaN  False  NaN  NaN  NaN  NaN  1000.0  NaN  False  NaN
53721  8968005  3040  HOUSE  HOUSE  413119.0  FOR_SALE  3.0  NaN  NaN  NaN  False  NaN  NaN  NaN  NaN  937.0  NaN  False  NaN
53722  8703267  3140  HOUSE  HOUSE  419322.0  FOR_SALE  3.0  NaN  NaN  NaN  False  NaN  NaN  NaN  NaN  730.0  NaN  False  NaN
53723  9027423  3020  HOUSE  HOUSE  442072.0  FOR_SALE  3.0  NaN  NaN  NaN  False  NaN  NaN  NaN  NaN  1000.0  NaN  False  NaN
53724  8714117  1860  HOUSE_GROUP  HOUSE_GROUP  NaN  FOR_SALE  NaN  NaN  NaN  False  False  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
53725  8667513  3080  HOUSE_GROUP  HOUSE_GROUP  NaN  FOR_SALE  NaN  NaN  NaN  NaN  False  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
53726  9020840  1785  HOUSE  HOUSE  417500.0  FOR_SALE  3.0  NaN  NaN  NaN  False  NaN  NaN  NaN  NaN  396.0  3.0  NaN  NaN
53727  8747154  3470  HOUSE  COUNTRY_COTTAGE  750000.0  FOR_SALE  3.0  NaN  NaN  False  False  NaN  NaN  NaN  NaN  0.0  NaN  False  NaN
53728  6992573  1500  APARTMENT_GROUP  APARTMENT_GROUP  NaN  FOR_SALE  NaN  NaN  NaN  NaN  False  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
53729  6918766  1500  APARTMENT_GROUP  APARTMENT_GROUP  NaN  FOR_SALE  NaN  NaN  NaN  NaN  False  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN